Takatsugu ONO Koji INOUE Kazuaki MURAKAMI Kenji YOSHIDA
This paper proposes a software-controllable variable line-size (SC-VLS) cache architecture for low power embedded systems. High bandwidth between logic and a DRAM is realized by means of advanced integrated technology. System-in-Silicon is one of the architectural frameworks to realize the high bandwidth. An ASIC and a specific SRAM are mounted onto a silicon interposer. Each chip is connected to the silicon interposer by eutectic solder bumps. In the framework, it is important to reduce the DRAM energy consumption. The specific DRAM needs a small cache memory to improve the performance. We exploit the cache to reduce the DRAM energy consumption. During application program executions, an adequate cache line size which produces the lowest cache miss ratio is varied because the amount of spatial locality of memory references changes. If we employ a large cache line size, we can expect the effect of prefetching. However, the DRAM energy consumption is larger than a small line size because of the huge number of banks are accessed. The SC-VLS cache is able to change a line size to an adequate one at runtime with a small area and power overheads. We analyze the adequate line size and insert line size change instructions at the beginning of each function of a target program before executing the program. In our evaluation, it is observed that the SC-VLS cache reduces the DRAM energy consumption up to 88%, compared to a conventional cache with fixed 256 B lines.
Shan DING Hiroyuki TOMIYAMA Hiroaki TAKADA
An advanced communication system, the FlexRay system, has been developed for future automotive applications. It consists of time-triggered clusters, such as drive-by-wire in cars, in order to meet different requirements and constraints between various sensors, processors, and actuators. In this paper, an approach to static scheduling for FlexRay systems is proposed. Our experimental results show that the proposed scheduling method significantly reduces up to 36.3% of the network traffic compared with a past approach.
Hamid NOORI Maziar GOUDARZI Koji INOUE Kazuaki MURAKAMI
Energy consumption is a major concern in embedded computing systems. Several studies have shown that cache memories account for 40% or more of the total energy consumed in these systems. Active power used to be the primary contributor to total power dissipation of CMOS designs, but with the technology scaling, the share of leakage in total power consumption of digital systems continues to grow. Moreover, temperature is another factor that exponentially increases the leakage current. In this paper, we show the effect of temperature on the optimal (minimum-energy-consuming) cache configuration for low energy embedded systems. Our results show that for a given application and technology, the optimal cache size moves toward smaller caches at higher temperatures, due to the larger leakage. Consequently, a Temperature-Aware Configurable Cache (TACC) is an effective way to save energy in finer technologies when the embedded system is used in different temperatures. Our results show that using a TACC, up to 61% energy can be saved for instruction cache and 77% for data cache compared to a configurable cache that has been configured for only the corner-case temperature (100). Furthermore, the TACC also enhances the performance by up to 28% for the instruction cache and up to 17% for the data cache.
Shuichi ICHIKAWA Takashi SAWADA Hisashi HATA
By diversifying processor architecture, computer software is expected to be more resistant to plagiarism, analysis, and attacks. This study presents a new method to diversify instruction set architecture (ISA) by utilizing the redundancy in the instruction set. Our method is particularly suited for embedded systems implemented with FPGA technology, and realizes a genuine instruction set randomization, which has not been provided by the preceding studies. The evaluation results on four typical ISAs indicate that our scheme can provide a far larger degree of freedom than the preceding studies. Diversified processors based on MIPS architecture were actually implemented and evaluated with Xilinx Spartan-3 FPGA. The increase of logic scale was modest: 5.1% in Specialized design and 3.6% in RAM-mapped design. The performance overhead was also modest: 3.4% in Specialized design and 11.6% in RAM-mapped design. From these results, our scheme is regarded as a practical and promising way to secure FPGA-based embedded systems.
Hiroyuki TOMIYAMA Shin-ichiro CHIKADA Shinya HONDA Hiroaki TAKADA
This paper presents an RTOS-based methodology for design and validation of embedded systems. The heart of our methodology is the use of an RTOS simulation model from the very early stage of the system design. A case study with a JPEG decoder application is also presented in order to demonstrate the effectiveness of our methodology.
Embedded systems are used in broad fields. They are one of the indispensable and fundamental technologies in a highly informative society in recent years. As embedded systems are large-scale and complicated, it is prosperous to design and develop a system LSI (Large Scale Integration). The structure of the system LSI has been increasing complexity every year. The degree of improvement of its design productivity has not caught up with the degree of its complexity by conventional methods or techniques. Hence, an idea for the design of a system LSI which has the flow of describing specifications of a system in UML (Unified Modeling Language) and then designing the system in a system level language has already proposed. It is important to establish how to convert from UML to a system level language in specification description or design with the idea. This paper proposes, extracts and verifies transformation rules from UML to SpecC which is one of system level languages. SpecC code has been generated actually from elements in diagrams in UML based on the rules. As an example to verify the rules, "headlights control system of a car" is adopted. SpecC code has been generated actually from elements in diagrams in UML based on the rules. It has been confirmed that the example is executed correctly in simulations. By using the transformation rules proposed in this paper, specification and implementation of a system can be connected seamlessly. Hence, it can improve the design productivity of a system LSI and the productivity of embedded systems.
Shinya HONDA Takayuki WAKABAYASHI Hiroyuki TOMIYAMA Hiroaki TAKADA
With the growing design complexity of contemporary embedded systems, real-time operating systems (RTOSs) have become one of important components of such complex embedded systems. This paper presents an RTOS-centric hardware/software cosimulator which we have developed for embedded system design. One of the most remarkable features in our cosimulator is that it has a complete simulation model of an RTOS which is widely used in industry, so that application tasks including RTOS service calls are natively executed on a host computer. Our cosimulator also features cosimulation with functional simulation models of hardware written in C/C++ and cosimulation with HDL simulators. A case study with a JPEG decoder application demonstrates the effectiveness of our cosimulator.
Energy consumption is one of the most critical constraints in the design of portable embedded systems. This paper describes an empirical study about the impacts of compiler optimizations on the energy consumption of the address bus between processor and instruction memory. Experiments using a number of real-world applications are presented, and the results show that transitions on the instruction address bus can be significantly reduced (by 85% on the average) by the compiler optimizations together with bus encoding.
Motoki KIMURA Morgan Hirosuke MIKI Takao ONOYE Isao SHIRAKAWA
A Java execution environment is implemented, in which a hardware engine is operated in parallel with an embedded processor. This pair of hardware facilities together with an additional software kernel are devised for existing embedded systems, so as to execute Java applications more efficiently in such a way that 39 instructions are added to the original Java Virtual Machine to implement the software kernel. The exploration of design parameters is also attempted to attain a low hardware cost and high performance. The proposed hardware engine of a 6-stage pipeline can be integrated in a single chip using 30 k gates together with the instruction and data cache memories. The proposed approach improves the execution speed by a factor of 5 in comparison with the J2ME software implementation.
Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
In this paper, we propose an instruction encoding scheme to reduce power consumption of instruction ROMs. The power consumption of the instruction ROM strongly depends on the switching activity of bit-lines due to their large load capacitance. In our approach, the binary-patterns to be assigned as op-codes are determined based on the frequency of instructions in order to reduce the number of bit-line dis-charging. Simulation results show that our approach can reduce 40% of bit-line switchings from a conventional organization.
Trong-Yen LEE Pao-Ann HSIUNG Sao-Jie CHEN
The hardware-software codesign of distributed embedded systems is a more challenging task, because each phase of codesign, such as copartitioning, cosynthesis, cosimulation, and coverification must consider the physical restrictions imposed by the distributed characteristics of such systems. Distributed systems often contain several similar parts for which design reuse techniques can be applied. Object-oriented (OO) codesign approach, which allows physical restriction and object design reuse, is adopted in our newly proposed Distributed Embedded System Codesign (DESC) methodology. DESC methodology uses three types of models: Object Modeling Technique (OMT) models for system description and input, Linear Hybrid Automata (LHA) models for internal modeling and verification, and SES/workbench simulation models for performance evaluation. A two-level partitioning algorithm is proposed specifically for distributed systems. Software is synthesized by task scheduling and hardware is synthesized by system-level and object-oriented techniques. Design alternatives for synthesized hardware-software systems are then checked for design feasibility through rapid prototyping using hardware-software emulators. Through a case study on a Vehicle Parking Management System (VPMS), we depict each design phase of the DESC methodology to show benefits of OO codesign and the necessity of a two-level partitioning algorithm.
Jih-Ming FU Trong-Yen LEE Pao-Ann HSIUNG Sao-Jie CHEN
Most of current codesign tools or methodologies only support validation in the form of cosimulation and testing of design alternatives. The results of hardware-software codesign of a distributed system are often not verified, because they are not easily verifiable. In this paper, we propose a new formal coverification approach based on linear hybrid automata, and an algorithm for automatically converting codesign results to the linear hybrid automata framework. Our coverification approach allows automatic verification of real-time constraints such as hard deadlines. Another advantage is that the proposed approach is suitable for verifying distributed systems with arbitrary communication patterns and system architecture. The feasibility of our approach is demonstrated through several application examples. The proposed approach has also been successfully used in verifying deadline violations when there are inter-task communications between tasks with different period lengths.
Barry SHACKLEFORD Mitsuhiro YASUDA Etsuko OKUSHI Hisao KOIZUMI Hiroyuki TOMIYAMA Akihiko INOUE Hiroto YASUURA
Entire systems embedded in a chip and consisting of a processor, memory, and system-specific peripheral hardware are now commonly contained in commodity electronic devices. Cost minimization of these systems is of paramount economic importance to manufactures of these devices. By employing a variable configuration processor in conjunction with a multi-precision compiler generator, we show that there are situations in which considerable system cost reduction can be obtained by synthesizing a CPU that is narrower than the largest variable in the application program.
The paper describes a novel 32-bit RISC microprocessor architecture for embedded systems. Variable-length instructions of 16, 32 or 48 bits provide compact code since the majority of instructions are 16 bits in length. The basic instruction format of 16 bits allows only 2 register adresses of 5 bits each; however, it is shown that the overhead in the instruction count is only between 14% and is far outweighed by the savings in program size. The register set provides addressing of 16 global and up to max. 16 local registers per stack frame in a register stack of 64 registers. The stack frames are of variable length with a variable overlap for parameter passing. A load/store architecture is used; memory accesses are pipelined. Nearly all instructions execute in a single cycle. A two-stage pipeline (decode/execute) minimizes wait cycles after pipeline breaks due to branches. An instruction cache of 128 bytes employs an efficient look-ahead algorithm and is quickly updated in case of a cache miss. The µP is implemented in 1.2 µm CMOS on a die of 47 mm2. Power dissipation is only 0.5 W. The development environment is PC-based.